Long-term spectro-temporal information for improved automatic speech emotion classification
Abstract
This paper investigates the contribution of features which convey long-term spectro-temporal (ST) information for the purpose of automatic emotional speech classification. The ST representation is obtained by means of a modulation filterbank decomposition of long-term temporal envelopes of the outputs of a gammatone filterbank. The two-dimensional discrete cosine transform is used to reduce the dimensionality of the representation; candidate features are then derived from statistics computed from the DCT coefficients. Sequential forward feature selection is used to select the most salient features. Two types of experiments are described which use the Berlin emotional speech database to test the performance of the ST features alone and in combination with prosodic features. In a multi-class experiment, simulation results with a support vector classifier show that a 44% reduction in classification error is attained once prosodic features are combined with the proposed ST features. Additionally, in a one-against-all experiment, an average increase in F-score of 33% is attained when the proposed ST features are included.
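The dimensionality-reduction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the ST representation has already been computed per frame as a small matrix (acoustic channels by modulation channels), so the gammatone and modulation filterbanks are not shown. The function names, the number of retained low-order DCT coefficients (`keep`), and the choice of mean and standard deviation as the statistics are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def dct2_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def dct2_2d(matrix):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct2_1d(list(r)) for r in matrix]
    cols = [dct2_1d(list(c)) for c in zip(*rows)]
    # cols holds transformed columns; transpose back to row-major order.
    return [list(r) for r in zip(*cols)]


def st_features(st_frames, keep=(3, 3)):
    """Hypothetical feature extractor (illustrative only).

    st_frames: list of 2-D matrices, one ST representation per frame.
    keep: how many low-order DCT coefficients to retain per axis (assumed).
    Returns per-coefficient means followed by per-coefficient
    population standard deviations, computed across frames.
    """
    kept = []
    for frame in st_frames:
        coeffs = dct2_2d(frame)
        kept.append([coeffs[i][j]
                     for i in range(keep[0])
                     for j in range(keep[1])])
    per_coeff = list(zip(*kept))  # group values by coefficient index
    return [mean(c) for c in per_coeff] + [pstdev(c) for c in per_coeff]
```

A vector produced this way would form the candidate feature set; the sequential forward selection and the support vector classifier mentioned in the abstract would then operate on it and are outside the scope of this sketch.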
Similar resources
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered, and the attributes of the speech clusters were extracted as secondary features. However, the shape and position of speech clusters change over time. The clusters were temporally tracked and temporal tra...
Automatic speech emotion recognition using modulation spectral features
In this study, modulation spectral features (MSFs) are proposed for the automatic recognition of human affective information from speech. The features are extracted from an auditory-inspired long-term spectro-temporal representation. Obtained using an auditory filterbank and a modulation filterbank for speech analysis, the representation captures both acoustic frequency and temporal modulation ...
Recognition of Human Emotion in Speech Using Modulation Spectral Features and Support Vector Machines
Automatic recognition of human emotion in speech aims at recognizing the underlying emotional state of a speaker from the speech signal. The area has received rapidly increasing research interest over the past few years. However, designing powerful spectral features for high-performance speech emotion recognition (SER) remains an open challenge. Most spectral features employed in current SER te...
Spectro-temporal modulations for robust speech emotion recognition
Speech emotion recognition is mostly studied on clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and applied to detect the emotional state of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database by adding white and babble noise at various SNR levels. The clean train/noisy test scenario is in...